Minimizing Time When Applying Bootstrap to Contingency Tables Analysis of Genome-Wide Data

Authors

  • Francesco Sambo
  • Barbara Di Camillo
Abstract

Bootstrap resampling is increasingly applied to contingency tables analysis of Genome-Wide SNP data, to cope with the bias in genetic effect estimates, the large number of false positive associations, and the instability of the lists of SNPs associated with a disease. The bootstrap procedure, however, increases the computational cost by a factor B, where B is the number of bootstrap samples. In this paper, we study the problem of minimizing computation time when applying the bootstrap to contingency tables analysis and propose two levels of optimization of the procedure. The first level is based on an alternative representation of bootstrap replicates, bootstrap histograms, which is exploited to avoid unnecessary computations for subjects that are repeated within a bootstrap replicate. The second level is based on an ad hoc data structure, the bootstrap tree, which is exploited to reuse computations on sets of subjects shared by more than one bootstrap replicate. The problem of finding the best bootstrap tree for a given set of bootstrap replicates is tackled with best-improvement local search, for which different constructive procedures and local search operators are proposed. The two levels of optimization are tested on a real Genome-Wide SNP dataset, and both are shown to significantly decrease computation time.
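The abstract does not give implementation details, so the following is only a minimal sketch of the bootstrap histogram idea as described above. It assumes a replicate is a list of subject indices drawn with replacement, that genotypes are coded 0/1/2 and phenotypes 0/1, and that the histogram simply stores how many times each subject was drawn. All function and variable names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

N_GENOTYPES = 3   # assumed coding: 0, 1 or 2 copies of the minor allele
N_PHENOTYPES = 2  # assumed coding: 0 = control, 1 = case


def draw_replicate(n_subjects, rng):
    """Draw n_subjects indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n_subjects) for _ in range(n_subjects)]


def to_histogram(replicate):
    """Represent a replicate as {subject index: number of times drawn}."""
    return Counter(replicate)


def table_naive(replicate, genotypes, phenotypes):
    """Build the genotype-by-phenotype table with one update per draw."""
    table = [[0] * N_PHENOTYPES for _ in range(N_GENOTYPES)]
    for i in replicate:
        table[genotypes[i]][phenotypes[i]] += 1
    return table


def table_from_histogram(hist, genotypes, phenotypes):
    """Build the same table with one update per distinct subject."""
    table = [[0] * N_PHENOTYPES for _ in range(N_GENOTYPES)]
    for i, count in hist.items():
        # A subject drawn `count` times contributes `count` to its cell.
        table[genotypes[i]][phenotypes[i]] += count
    return table


# Toy data for a single SNP; the values are random, not real genotypes.
rng = random.Random(0)
n = 200
genotypes = [rng.randrange(N_GENOTYPES) for _ in range(n)]
phenotypes = [rng.randrange(N_PHENOTYPES) for _ in range(n)]

replicate = draw_replicate(n, rng)
hist = to_histogram(replicate)
naive = table_naive(replicate, genotypes, phenotypes)
fast = table_from_histogram(hist, genotypes, phenotypes)
print(naive == fast)            # the two tables agree
print(len(hist), "distinct subjects out of", n, "draws")
```

In this sketch the histogram is computed once per replicate and can then be reused for every SNP, so the per-SNP loop runs over distinct subjects only. The bootstrap tree, which shares work across replicates, is not covered here.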


Similar resources

Analysis of Dynamic Longitudinal Categorical Data in Incomplete Contingency Tables Using Capture-Recapture Sampling: A Case Study of Semi-Concentrated Doctoral Exam

Abstract. In this paper, dynamic longitudinal categorical data and estimation of their parameters in incomplete contingency tables are evaluated. To apply the proposed method, a study has been conducted on the data of the semi-concentrated doctoral exam of the National Organization for Educational Testing (NOET). The results of studies such as the obtained confidence intervals and calculating t...


Partial Association Components in Multi-way Contingency Tables and Their Statistical Analysis

In analyses of contingency tables made up of categorical variables, the study of the relationship between the variables is usually the major objective. So far, many association measures and association models have been used to measure the association structure present in the table. Although the association measures merely determine the degree of strength of association between the study varia...


Bootstrap tests for independence in two-way ordinal contingency tables

For the analysis of an r by c contingency table having ordered row categories and ordered column categories, a bootstrap method is applied for the model-based likelihood ratio test for independence. A model-based likelihood ratio chi-square statistic and the statistic of the maximum eigenvalue of a Wishart matrix are also discussed. A simulation study is performed to compare the proposed method...


On Resampling for Statistical Confidentiality in Contingency Tables

Resampling schemes, and especially the bootstrap method, were proposed as a subclass of perturbation methods to ensure statistical confidentiality in statistical databases. Later, a method based on bootstrapping was presented to achieve the more specific task of anonymising contingency tables. In this paper, we argue that the latter proposal is either inefficient from a computational point of view ...


TEAM: efficient two-locus epistasis tests in human genome-wide association study

As a promising tool for identifying genetic markers underlying phenotypic differences, genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene-gene interaction) is preferable over single locus study since many diseases are known to be complex traits. A brute force search is infeasible for epistasis detection in the genome-wid...




Publication date: 2012